.TH OCAMLBURG 1 "$ Date: $"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.hy 0
.if n .na
.\"
.SH NAME
ocamlburg \- Burg code-generator generator for Objective Caml
.SH SYNOPSIS
\fBocamlburg\fP \fIspec.mlb\fP 
.SH DESCRIPTION
\fBocamlburg\fP generates a tree-matching algorithm from a Burg
specification \fIspec.mlb\fP and emits it to stdout.  The generated code
implements a dynamic programming algorithm to cover a tree structure
with minimal cost. \fBocamlburg\fP is inspired by David Hanson's iburg(1)
for C: Fraser and Hanson: "Engineering a Simple, Efficient Code
Generator Generator", ACM Letters on Programming Languages and Systems
1, 3 (Sep 1992), 213-226.
.SH OPTIONS
.TP
\fB-version\fP
Prints some version information to stdout.
.TP
\fB-help\fP
Prints a short options summary to stdout.
.TP
\fB-norm\fP \fIspec.mlb\fP
This is an option for debugging \fBocamlburg\fP. \fBocamlburg\fP normalizes
rules during the translation process. In a normalized rule, no
constructor pattern has another constructor pattern as argument. When
the \fB-norm\fP flag is present, \fBocamlburg\fP emits the normalized rule set
to stdout.
.TP
\fB-twelf\fP \fIspec.mlb\fP
This option prints Twelf code to standard output.  Loading the output
into a Twelf server will check if the Burg rules cover all cases.  The
Twelf server will provide examples to illustrate any expression that
cannot be matched. (This is work in progress)
.SH BURG SPECIFICATION
.B SYNTAX
.PP
\fBocamlburg\fP accepts a Burg specification that conforms to the following
EBNF grammar. In an EBNF grammar, \fB|\fP separates alternatives, curly
braces \fB{ .. }\fP denote repletion, and square brackets \fB[ .. ]\fP optional
parts. Terminals in the grammar are enclosed in double quotes, all other
symbols denote nonterminals.  Various parts of the grammar are discussed
below.
.nf
    spec    :   { decl } "%%" { rule }

    decl    :   "%term" { ident }
            |   "%head" code
            |   "%tail" code
            |   "%type" code

    rule    :   ident ":" pattern [ cost ] code

    code    :   "{:" camlcode ":}"
    cost    :   "[" number | code "]"
    
    pattern :   number
            |   """ string """
            |   "'" char "'"
            |   ident "(" pattern { "," pattern } ")"
            |   ident "(" ")"
            |   ident [ ":" ident ]
.fi
In addition to the grammar above, a rule pattern must not be a literal
number, literal string, or a terminal variable. Those can only appear as
constructor arguments like in \fBInt(23)\fP. 
.PP
Identifiers and numbers in a \fBocamlburg\fP specification match the
following regular expressions.
.nf
    alpha       = ['a'-'z' 'A'-'Z']
    digit       = ['0'-'9']
    ident       = alpha(digit|alpha|'_')*
    number      = digit+
.fi
A \fIstring\fP is enclosed in double quotes, a \fInumber\fP is a non-negative
integer, a 
.I char
is enclosed in single quotes. Use escapes for newline
\fB\en\fP, return \fB\er\fP, tabs \fB\et\fP, and single quotes \fB\e'\fP in character
constants.  \fICamlcode\fP is arbitrary code that may include properly
nested pairs of \fB{:\fP and \fB:}\fP.  Comments start with two hyphens and go
to the end of the line.  
.PP
Certain identifier are keywords and cannot be used as variables or
nonterminals: \fBstart\fP, \fBterm\fP, \fBtype\fP, \fBhead\fP, \fBtail\fP. Three
terminal types are predefined: \fBint\fP, \fBstring\fP, and \fBchar\fP. They
correspond to number, string, and character literals.
.PP
Here is an example for a specification:
.nf
    -- 
    -- sample.mlb
    --

    %head {: (* generated from sample.mlb, do not edit *) }
    %term int string        -- terminal type must be declared
    %type number {: int :}  -- type for nonterminal is optional
    %type str    {: string :}
    
    %%

    number : ADD(x:number,  y:number)     [2]    {: x + y :}
    number : ADD(x:number,  NULL())       [1]    {: x     :}
    number : ADD(x:number,  ADD(NULL(), z:number))  [1] {: x + z :}
    number : SUB(n:number, m:number)             {: n-m :}
    number : MUL(n:number, m:number)             {: n*m :}
    number : DIV(n:number, m:number)    
            {: if m = 0 then assert false else n/m :}
    number : CONST(x: int)                [1]    {: x :}
    number : CONST(0)                     [0]    {: 0 :}

    str    : STR(x: string)               [1]    {: x :}
    str    : CONS(x: string, y:string)    [2]    {: x ^ y :}

    -- chain rules
    str    : number                [1]    {: string_of_int number :}
    number : str                   [1]    {: int_of_string str    :}
.fi
.PP
.B DECLARATIONS
.PP
Code associated with a \fB%head\fP declaration is copied verbatim to the
output \fIbefore\fP the code generated for rules, code from a \fB%tail\fP
declaration goes \fIafter\fP code for rules. This code is used to declare
type and values used by user-provided code in rules. 
.PP
A \fB%term\fP declaration lists Objective Caml types used for terminal
variables in rules. In the example above, \fBx:number\fP is a nonterminal
variable, and \fBx:string\fP is a terminal variable, because \fBstring\fP was
declared as a terminal type.
.PP
A \fB%type\fP declaration declares the Objective Caml type of a
nonterminal. For instance, the code above declares \fBnonterm\fP to produce
an \fBint\fP value, and \fBstr\fP to produce a \fBstring\fP. Because the declared
type may be complex, it is delimited by the same braces that are used
for code. Type declarations for nonterminals are optional.
.PP
.B RULES
.PP
A rule defines a nonterminal (type) that is to the left of the colon,
has a pattern, an optional non-negative cost, and a user-provided action. 
A pattern is similar to a pattern in Objective Caml: it can be a literal
value, a terminal or nonterminal variable, or have a constructor with
more patterns as arguments. When a rule is selected and executed at
run-time, it computes a value of the nonterminal of its left hand side.
.PP
While not enforced, it is a useful convention to capitalize constructors
in patterns like \fBNil()\fP, and to keep terminal and non-terminal
variables like \fBnumber\fP in lowercase.  Beware when using constructors
without arguments: \fBNil\fP is a variable, \fBNil()\fP a constructor. 
.nf
    -- rule with cost and action
    number : ADD(x:number,  y:number)     [2]    {: x + y :}

    -- chain rules
    str    : number                [1]    {: string_of_int number :}
    number : str                   [1]    {: int_of_string str    :}
.fi
A variable \fIx\fP without a type annotation stands for a variable
\fIx\fP\fB:\fP\fIx\fP. This often allows to omit types from variables if the
types of variables in a pattern are distinct.  
The two so-called chain rules at the end of the specification are an
example: the \fBnumber\fP variable in the first rule stands for
\fBnumber:number\fP and thus matches a \fBnumber\fP nonterminal value. 
.PP
The terminal and nonterminal variables of a rule are in scope of the
actions. In the example above the action refers to \fBx\fP and \fBy\fP, which
are defined in the pattern.
.PP
When a variable is referenced from Objective Caml code in an action, it
must follow the Objective Caml syntax for variables. For example, you
cannot use [[Letter]] as a variable, because variables in Objetive Caml
must start with a lowercase letter. A constructor should likewise be a
legal name in Objective Caml.
.PP
Chain rules are rules that have only a non-terminal variable as pattern.
They provide conversions between nonterminal values: a \fBnumber\fP
nonterminal value can be converted into a \fBstr\fP nonterminal value at
cost one by the first rule. The two rules are recursive but the
associated costs of one prevent that they are applied indefinitely.
.PP
.B COSTS
.PP
A Rule has an associated non-negative cost that is computed at run time.
The cost of a rule is the sum of the costs of its arguments and its
explicitly specified cost. If a rule has no explicit cost, it defaults
to zero.  The rule's cost specification is either static, or dynamic.  A
static cost is a non-negative integer; a dynamic cost is an expression
that is evaluated at run-time. The values of terminal variables of
\fIunnested\fP patterns are available in the cost expression.  Thus, the
cost of a rule can depend on constructor arguments.
.PP
In the example below, the first cost is dynamic, the second static. A
dynamic cost expression is enclosed in \fB{:\fP and \fB:}\fP, like other
literal OCaml code.
.nf
    str    : STR(x: string) [{: String.length x :}]     {: x :}
    str    : CONS(x: string, y:string)    [2]           {: x ^ y :}
.fi
Note that it is impossible to use a variable from a nested pattern in
the cost expression. However, such a variable \fIis\fP visible in the
action.  See the example below:
.nf
    t      : X(x:int, Y(y:int)) [{: y is invisible here :}] {: x + y :} 
.fi
.SH THE GENERATED CODE
The purpose of the generated code is to select the rules from a set that
match a (subject) tree at the smallest cost, according to the cost
annotations.  The generated code contains a function for every
constructor (the constructor name is prefixed with \fBcon\fP). For the
example above, these are:
.nf
    module Camlburg: sig
        type cost = int                      
        type 'a nt =                        
            { cost : cost
            ; action : unit -> 'a; 
            } 
        ...
    end

    type ('a, 'b, 'c, 'd) nonterm = 
        { _ADD2     : 'a;           (* private *)
        ; _NULL1    : 'b;           (* private *)
        ; number    : 'c;
        ; str       : 'd;
        } 


    type t =
        ( int       Camlburg.nt     (* private *)
        , unit      Camlburg.nt     (* private *) 
        , int       Camlburg.nt     (* for number nonterminal *)
        , string    Camlburg.nt     (* for str    nonterminal *)  
        ) nonterm 
        
    val conNULL  : unit -> t
    val conCONST : int -> t
    val conSTR   : string -> t
    val conCONS  : string -> string -> t
    val conDIV   : t -> t -> t   
    val conADD   : t -> t -> t   
    val conMUL   : t -> t -> t   
    val conSUB   : t -> t -> t   
.fi
.SH THE CLIENT
To find the cheapest cover for a subject tree, the client walks over the
subject tree and calls the appropriate function for the actual node: For
every pattern constructor \fIC\fP the generated code contains a function
\fBcon\fP\fIC\fP.  At a leave with an integer constant, it calls \fBconCONST\fP.
There are two rules for the \fBCONST\fP constructor, depending on the
integer constant the generated code will select the cheaper one and
return a value of type \fBt\fP. This value represents the user-provided
code of the selected rule and the associated cost.  If the client comes
to a node with an \fBADD\fP constructor, it calls \fBconADD\fP and passes
values as arguments that were returned by the above functions when the
child notes were visited. Finally, the client comes to the root node and
receives a final \fBt\fP value for it. This value represents the cheapest
cover for the subject tree. 
.PP
Typically, the root node of a subject tree is covered only by a single
nonterminal, for example the \fBnumber\fP nonterminal. When the \fBaction\fP
for this nonterminal is triggered, the user actions from the rules for
the cheapest cover are computed:
.nf
    ...
    let t = conADD(left,right) in
        t.number.Camlburg.action ()     (* an int value *)
.fi
The generated code thus has constructed a value \fBt\fP that represents the
cheapest set of rules (and therefore actions) that cover the walked
tree. The \fBaction\fP field of the finally returned value gives access to
the actions. In the example, the tree is some kind of expression tree
and the constructed action is an evaluation. Because of the chain rules,
tree that is covered by a \fBnumber\fP value is also covered by a \fBstr\fP
value. This allows to obtain a string value as well:
.nf
    ...
    let t = conADD(left,right) in
        t.str.Camlburg.action ()        (* an string value *)
.fi
.SH FILES
The generated code relies on the small module \fBCamlburg\fP that comes as
\fBcamlburg.mli\fP and \fBcamlburg.ml\fP with \fBocamlburg\fP. 
.SH AUTHORS
Christian Lindig <lindig@eecs.harvard.edu>, 
Norman Ramsey <nr@eecs.harvard.edu>,
Kevid Redwine <redwine@eecs.harvard.edu>.
.SH COPYING
This software is in the public domain.
.PP
THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR AND COPYRIGHT HOLDER BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
.SH VERSION
$Id: ocamlburg.1 8 2006-07-11 13:18:57Z lindig $
.SH SEE ALSO
ocaml(1), http://www.ocaml.org/
.br
Fraser and Hanson: "Engineering a Simple, Efficient Code Generator
Generator", ACM Letters on Programming Languages and Systems 1, 3 (Sep
1992), 213-226.
.br
http://www.cminusminus.org/. \fBocamlburg\fP is part of the Quick C--
compiler.

